SLIP Research Note I
November 27, 2001
Obtaining Informational Transparency with Selective Attention
Dr. Paul S. Prueitt
President, OntologyStream Inc
One needs the zip file amSLIP.zip to follow this Research Note.
Over the past few weeks I have been concerned with how the structure of a cluster might be represented graphically. A number of issues are involved.
1) The completeness and consistency of the graphical representation of a category (see Figure 3)
2) The size of the category or the size of various report results. The size may make understanding the relationships difficult due to the amount of data
3) Results from other data mining techniques. There are other data mining techniques that can supplement the SLIP non-specific relationship
Visualization and cognitive aids (sense-making and decision-making aids) are essential to all three issues.
Future research notes will address the issues of size and the supplementation from other data mining.
Cylant IDS data is currently being studied as a supplement to the SLIP technology.
Software development issues:
1) Completing the SLIP Warehouse Browser so that all input files (paired.txt and Datawh.txt) that are required by the SLIP Technology Browser version 2.2.0 are made available.
2) Reviewing and developing the non-database processes using Referential Information Base (RIB) techniques.
This Research Note addresses the question of where we are in our project.
I am looking to complete an operational system based on SLIP technology and related technologies, in support of Information Assurance. The estimate for the completion of this system is 3 to 6 months. A proof of concept can be made based on the software that I currently have finished.
Much of my time this week has been spent refining the ability to generate reports, sort the display of the report, organize the information in the report, and visualize event properties. I have some preliminary work to show, and some design work. I am also making small corrections in the software code.
Figure 1: Version 2.2.0 with the help file
The current sort function sorts by column, but the comparison is based on ASCII order, so values are compared as text rather than as numbers. As I complete the response object, the user will see additional response comments so that the user can understand what has happened when the command “sort n” is executed (where n is between 1 and the total number of columns).
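The effect of ASCII ordering can be seen in a small sketch. The Python below is illustrative only and is not the Browser's code; the function name sort_report and the choice of sample rows (three records from the category C1 Report) are assumptions made for the example.

```python
def sort_report(rows, n):
    """Sort report rows by column n (1-based), comparing values as text."""
    if not 1 <= n <= len(rows[0]):
        raise ValueError("n must be between 1 and the number of columns")
    # Python compares strings character by character, which for plain
    # ASCII text is the same as ASCII order.
    return sorted(rows, key=lambda row: row[n - 1])

# Three records from the category C1 Report, split into columns.
rows = [line.split() for line in (
    "989615955 23992 0 s48745 d629 208.205.160.42 208.205.160.42 tcp",
    "989615960 23999 0 s48745 d1418 208.205.160.42 208.205.160.42 tcp",
    "989615966 24007 0 s48745 d941 208.205.160.42 208.205.160.42 tcp",
)]

# Column 5 holds the d_port value in the records printed in this note.
by_port = [row[4] for row in sort_report(rows, 5)]
```

Under text comparison d1418 sorts ahead of d629, because the character "1" precedes "6", even though 1418 is the larger port number.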
Figure 2: Sort function
The determinants of the current Report are under-constrained. This is as it should be. Additional constraints are generally imposed by SQL clauses. However, the data itself is incomplete. Without some top-down expectancy, the predictive potential of the SLIP technology will NOT be available.
The current expectancy is problematic for both human experts and artificial intelligence. It is this problem that the SLIP Technology is addressing most directly.
I am refining some of the background processes, particularly related to the development of event maps (see Figure 4). These processes are producing a predictive capability.
The current Report file has some records that are not relevant to the category. This is because the current retrieval uses only the atoms, one at a time, and delivers to the Report file each record in Datawh.txt whose designated column value matches the value of the atom.
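The retrieval described above can be sketched as follows. This is a minimal illustration, not the FoxPro or VB code; the function name retrieve, the whitespace-separated layout, and the position of the designated column are assumptions taken from the records printed in this note.

```python
def retrieve(records, atoms, column):
    """Deliver every record whose value in `column` (0-based) equals an atom.

    Atoms are processed one at a time, so the report is simply the
    concatenation of the per-atom matches. Nothing else is checked.
    """
    report = []
    for atom in atoms:
        for record in records:
            if record.split()[column] == atom:
                report.append(record)
    return report

# A hypothetical three-record stand-in for Datawh.txt.
datawh = [
    "991024980 12014 0 s21794 d629 207.172.106.87 208.205.160.42 tcp",
    "989615955 23992 0 s48745 d629 208.205.160.42 208.205.160.42 tcp",
    "989615966 24007 0 s48745 d941 208.205.160.42 208.205.160.42 tcp",
]
report = retrieve(datawh, atoms=["d941", "d629"], column=4)
```

Because the match looks only at the atom's own column, the s21794 record is delivered along with the s48745 records, even though s21794 links d629 to no other atom in this sample.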
In addition to the conceptual work that I do, and the theoretical work, and the software design and coding, there are some current technical issues. I work these issues continuously. For example, in the current Report file, some of the values do not have a “standing” in the category.
By this, one means that the theoretical issue of completeness and consistency has yet to produce algorithms that filter the current report. The theory is clear, but the algorithm is not yet completed. Once the filter is in place, then the software is just more interesting to the user. The theory never has to be understood by the user. But the software must reflect a grounded theory of knowledge and knowledge management.
Let us look at some detail. If a record is in the Report then it is in the Report because of a computed match to an atom value. The theory tells us that the relationship between two atoms is why the category exists. The theory also tells us that given any two atoms in the record, there should be one or more “b” values that have established that relationship. If this “b” value is not relevant to the global event map, then the record should be eliminated from the Report. But how to do this?
The answer to this question is clearly made in the presentations that I have delivered on a global stratified taxonomy. The answer MUST BE in the development of a common language used by the human community AND must be related to formal constructs that reflect SLIP data aggregation and other data mining processes. Why can we be so sure of where the answers MUST come from? The theory tells us this, and given the experimental trial use of the software, the domain experts will support the same conclusions that I have derived from SLIP theory.
There must be a human element, based on a common taxonomy of terms, to our solution AND a technological solution based on abstractions. These abstractions are NECESSARY to encode the patterns of invariance that occur at the (1) bit stream level and (2) the summary results from Intrusion Detection Systems such as RealSecure or Cylant. The abstractions may present a barrier to using the technology, but this barrier can be overcome easily IF we use the Browsers to examine real data that the domain expert is interested in. The user will quickly adopt the necessary abstractions and start to use the event maps as a means to reason about and share knowledge about the types and classes of events that are occurring in the Internet, in real time.
A technical note should be made at this point. The use of different data sources is vital if the SLIP technology is to be flexible and general in nature. In Figure 3 we see the trace of the event record,
[1] 991024980 12014 0 s21794 d629 207.172.106.87 208.205.160.42 tcp
(Actually, the record occurs three times.) Other records also occur three times, and we have to figure out why the records are retrieved more than once. Both the FoxPro code and the VB code generate exactly the same Report. This is likely a process design flaw or something about the Cylant data set.
The current focus on one RealSecure data set only attends to part of the available data sources for the SLIP Technology. The development team must have a richer data set and more contact with the domain experts.
Figure 3: The extra records in category C1
Returning to the technical detail, we can see that much has been gained during our IRAD project. But significant work needs to be done to make the technical detail seamless.
Using the Cylant data set, we have an example of a record that should not be reported at all or should be reported as an “external fact”. Again, due to the well-developed theory, we know that each atom has a set of non-specific relationships that are determined by a common “b” value. The theory is well developed in formal theorems in my collaboration with several mathematicians and is well developed in algorithmic specification in my collaboration with computer scientists. Similar results can be created using the RealSecure data.
I concluded that if there is a non-specific relationship to atoms NOT in a specific cluster, then this relationship must be seen as an “external” valence. I then evolved the software in the direction of showing two categories of relationships between atoms and atom compounds, the external relationship and the internal relationship (see Figure 4). The concept of valence is well developed in my published work over a period of ten years.
The concept of valence suggests that a chemistry of event composition can be reflected in the event maps. As the event maps are shared with a secure community, one expects that the maps will facilitate rapid communication within the community. The event maps also can be used to automate the construction and use of Petri Nets and rule bases.
The automated construction of Petri nets and rule sets has not been tested but I have explored privately with a PhD candidate at University of Idaho the process whereby the SLIP and Petri Net technology can be integrated. My discussions within my close community are always made with an agreement to professional confidentiality and with limited sharing of knowledge about my relationship to my clients. The purpose of these discussions is to know where the leading edge is in detection technology and vulnerability analysis.
Let us return to a discussion about event maps. Consider category C1 (available in amSLIP.zip). The six atoms all have a chaining relationship; we know this because cluster C1 is prime (all atoms quickly move to the same location). The chain is in fact represented as the set of ordered triples (called “syntagmatic units” in semiotic theory):
{ <d941, s48745, d900>, <d900, s48745, d790>, <d790, s48745, d780>, <d780, s48745, d629>, <d629, s48745, d1418> }
A transitive relationship, a * b and b * c → a * c, allows us to indicate 6 links (in red) as the above set or as the fully enumerated set with 24 links (in red and blue)
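The chain of ordered triples and the transitive rule can be illustrated with a short sketch. The function names chain_triples and transitive_closure are assumptions introduced for the example, and the sketch does not attempt to reproduce the link counts drawn in Figure 4, which depend on how links are drawn and counted there.

```python
def chain_triples(atoms, b):
    """Link consecutive atoms through the shared value b: <a, b, a_next>."""
    return [(a, b, c) for a, c in zip(atoms, atoms[1:])]

def transitive_closure(pairs):
    """Add (a, c) whenever (a, b) and (b, c) are present, until stable."""
    closure = set(pairs)
    while True:
        new = {(a, d) for a, b in closure for c, d in closure if b == c}
        if new <= closure:
            return closure
        closure |= new

# The six atoms of category C1, in the order used in the chain above.
atoms = ["d941", "d900", "d790", "d780", "d629", "d1418"]
triples = chain_triples(atoms, "s48745")

# Drop the shared value to get atom-to-atom links, then close them.
links = transitive_closure({(a, c) for a, _, c in triples})
```

The triples list matches the set of syntagmatic units given above, and the closure contains derived links such as (d941, d1418) that the chain itself only implies.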
Figure 4: The event map for category C1
After [1] is removed and the duplicates are removed we have the following Report for category C1.
989615966 24007 0 s48745 d941 208.205.160.42 208.205.160.42 tcp
989615951 23988 0 s48745 d900 208.205.160.42 208.205.160.42 tcp
989615969 24011 0 s48745 d790 208.205.160.42 208.205.160.42 tcp
989615958 23996 0 s48745 d780 208.205.160.42 208.205.160.42 tcp
989615955 23992 0 s48745 d629 208.205.160.42 208.205.160.42 tcp
989615960 23999 0 s48745 d1418 208.205.160.42 208.205.160.42 tcp
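The clean-up that leads to this Report can be sketched in a few lines. The filter algorithm is not yet complete, so the criterion used here is only one possible reading of the completeness requirement and is an assumption: a record is kept only if its s_port (the “b” value) links at least two distinct atoms in the Report. The function name filter_report and the column positions are also assumptions.

```python
from collections import defaultdict

def filter_report(records, s_col=3, d_col=4):
    """Drop duplicate records and records whose s_port links only one atom."""
    # dict.fromkeys removes duplicates while keeping first-occurrence order.
    unique = list(dict.fromkeys(records))
    atoms_by_b = defaultdict(set)
    for record in unique:
        fields = record.split()
        atoms_by_b[fields[s_col]].add(fields[d_col])
    # Assumed criterion: the b value must relate two or more atoms.
    return [r for r in unique if len(atoms_by_b[r.split()[s_col]]) >= 2]

records = [
    "991024980 12014 0 s21794 d629 207.172.106.87 208.205.160.42 tcp",
    "991024980 12014 0 s21794 d629 207.172.106.87 208.205.160.42 tcp",
    "989615966 24007 0 s48745 d941 208.205.160.42 208.205.160.42 tcp",
    "989615955 23992 0 s48745 d629 208.205.160.42 208.205.160.42 tcp",
    "989615955 23992 0 s48745 d629 208.205.160.42 208.205.160.42 tcp",
]
filtered = filter_report(records)
```

On this sample the s21794 record is dropped, because s21794 reaches only d629, and each s48745 record appears once. A fuller treatment would report the dropped record as an “external fact” rather than discard it.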
The event map in Figure 4 is hand drawn from data that is now derived from my FoxPro programs and data structures. I have almost redesigned a data aggregation process so that these drawings are automatically produced in the SLIP Technology Browser and viewed as indicated in Figure 5.
Figure 5: Mock-up of how the event maps will be viewed using the Technology Browser
Clearly, the event maps will characterize global events and provide a common means to discuss these events. In Figure 5 I look ahead to how the event maps will be automatically generated and displayed using the SLIP Technology Browser. I am developing this work as rapidly as I possibly can.
A second event map
In category E2, we have the following set of syntagmatic units:
{ <d520, s1024, d3130>, <d3130, s1024, d161>, <d161, s1024, d520 > }
and also
{ <d3128, s2417, d568> }
One can see the two clusters by looking in the Members window.
There are three atoms that are at degree 306-307 and two atoms that are at degree 156 (Figure 6). By manually looking into the report we find the source ports that bind the links together (Figure 7). Anyone who has the zip file (686 K) amSLIP.zip can click on category E2 and magnify to 200 by typing “mag 200” in the command line.
Figure 6: Clustered patterns for a small group of 10 atoms
We will always be able to draw an event map (like Figure 4 or Figure 7) for any category. Event map representation can be automated for even the full set of atoms in A1, thus giving a complete day’s data set, from any IDS, a one-stop visualization.
Figure 7: Event map for category E2
Because of event-map automation, the concept of an automatically produced SLIP Framework is even more interesting. The ending nodes of an automatically produced Framework will each have the character of Figure 4. Common graphic patterns are easily recognized. Complete visual summarization for a day or week or even a month can be produced, saved and printed.
The event map (Figure 7) shows a link to port 80 from both the { 3130, 161, 520 } cluster and the { 3128, 568 } cluster. By looking at Figure 8 one can follow how the category E2 was produced. The large cluster in Figure 8 contains two ports, 80 and 113, that together link the entire large cluster together (all 72 atoms).
I moved the large cluster into the category B1, and then removed just these two atoms manually so that the clustering process was fractured. We can now see the structure of links with d80 and d113 not in the category. Later after finding the two small primes (in Figure 7) I checked manually to find that in fact the two primes are related to d80. This is indicated as an external link (dotted line). This means that the two primes (prime at the E level) are both part of a larger prime at the B level.
The event map in Figure 6 shows a non-specific relationship that links together two small groups of defensive ports (because a common s_port is used to access both port groups during the time of the global event). This is Cylant data and I do not have any idea what the global event was.
Figure 8: Cluster pattern for the Cylant data set (amSLIP.zip)
In Figure 8 we have the top node of the Cylant data set. The Cylant data is taken from a behavioral study of the Linux kernel on a computer that is connected to the Internet as a type of HoneyNet.
Figure 9: The mock up for the SLIP Warehouse Browser
The screen in Figure 8 shows the analytic conjecture for the Cylant data. The analytic conjecture is that d_port values are related by having a common s_port.
The drill down
One of our sources of data is a RealSecure event log from April 15th. I have around 14,000 RealSecure IDS records where each of these records is a signature produced in response to some event defined by the IDS as an intrusion event. I also have around 68,000 records from a Cylant IDS log file.
Any one of the SLIP prime categories will identify a small subset of these 14,000 records, AS WELL AS create an abstraction based on categorization of a number of similar RealSecure intrusion events. Each of these RealSecure intrusion events is a single event in the 14,000 records. The abstraction may be something like an event that involves port 80 and port 113, as seen in Figures 4 and 7.
This abstraction has various uses.
1) The abstraction is a query that can be used against the original data OR against new data.
2) The abstraction codifies a pattern of data that occurs more than once. The pattern involves more than one RealSecure event record, AND the pattern occurs in more than one situation that is more globally defined. Abstractions are easy to use once the context of the abstraction is made real in specific situations. For example, we use abstractions when we count: one, two, three, etc. Rendering the SLIP abstractions as event maps makes these abstractions very easy to see and use.
3) The abstraction is one of a small class of abstractions that, when talked about between analysts, results in a selective attention to real patterns of occurrence. The patterns and the abstractions can be used to understand events that are more global than what the event log data source is recording.
The drill down into a pre-defined event can occur in at least two ways:
1) The pre-defined event is something that has been identified by a CERT as something that occurred on April 15th 2001 (for example). The SLIP software is NOW developed to the point of being usable in constructing a set of abstractions (for example about RealSecure data patterns) that will illustrate the nature of this, or ANY, pre-defined event. We have not been spending the time that is required to do this because the client is busy with current events. Thus I have been developing the software so that analysts can use the tools in a secure environment on real time event analysis. So any new data source can NOW be used to develop an understanding of the patterns of data and the rendering of each of the patterns as an abstraction with a simple visualization as an event map. My examination of the Cylant data illustrates this generality.
2) Each pre-defined event (we have only one pre-defined event provided to me) will produce a small set of abstractions. Again these abstractions are patterns of data occurrence. Given a pattern there is more than one instance of the pattern, and thus the pattern itself IS an abstraction.
The event maps give a simple visualization of these patterns.
Once one has a number of event maps then these event maps can be used to retrieve (using standard query language) all records that have the pattern of relationships in the event map composition. This retrieval capability is available for audit trail of any pre-defined event, as well as for prediction of the potential occurrence of a similar (similar to the pre-defined event) event from incomplete RealSecure data records (in real time).
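As a rough illustration of using an event map as a query, the sketch below turns one shared “b” value and its atoms into a parameterized SQL statement and runs it with Python's built-in sqlite3 module. The table name events, the column names ts, s_port and d_port, and the function name event_map_query are hypothetical and are not part of the SLIP software.

```python
import sqlite3

def event_map_query(b_value, atoms):
    """Build a parameterized SQL query from one event map."""
    placeholders = ", ".join("?" for _ in atoms)
    sql = ("SELECT * FROM events WHERE s_port = ? "
           f"AND d_port IN ({placeholders})")
    return sql, [b_value, *atoms]

# A hypothetical in-memory table holding three of the records shown earlier.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (ts INTEGER, s_port TEXT, d_port TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", [
    (989615966, "s48745", "d941"),
    (989615955, "s48745", "d629"),
    (991024980, "s21794", "d629"),
])

# Part of the category C1 event map: one s_port and some of its d_ports.
sql, params = event_map_query("s48745", ["d941", "d900", "d629"])
matches = conn.execute(sql, params).fetchall()
```

The same statement can be run against the original data or against newly arriving records, which is the sense in which the abstraction acts as a query.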
<atom = 3128, count =10>
<2417 1163 46819 41299 4511 1706 1711 1708 10246 1305 >
<atom = 3130, count =2>
<1024 2403 >
<atom = 161, count =4>
<1025 1024 1393 10010 >
<atom = 520, count =1>
<1024 >
<atom = 568, count =1>
<2417 >
<atom = 80, count =103>
<37604 37625 3270 62556 62577 62584 2415 3917 62781 62978 63008 63031 63149 61241 64244 48627 1126 12073 63557 63603 63605 36319 63624 63633 63653 63663 1982 63753 2215 4074 63658 63669 2370 1144 1303 63925 1415 1313 1315 1321 1323 1325 1327 64098 64013 1336 1338 4743 3394 64577 64627 64629 13687 65051 3065 61357 61649 56444 4676 61991 62231 62237 62353 62388 2855 2858 1509 3955 62523 2861 62609 62783 3936 1095 4537 4539 63227 4487 1840 34084 1673 62851 63253 1391 4427 27132 3426 1224 42546 1125 3016 10769 10915 22886 4915 10711 6980 3993 1098 1099 33718 2639 1186 >
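The listing above can be read mechanically to recover the two small clusters of category E2. The sketch below is an assumption about how such a listing might be parsed and is not the Browser's code; the names PATTERN and shared_values are hypothetical, and only the five small atoms are included (atom 80 is left out for brevity).

```python
import re
from collections import defaultdict

LISTING = """
<atom = 3128, count =10> <2417 1163 46819 41299 4511 1706 1711 1708 10246 1305 >
<atom = 3130, count =2> <1024 2403 >
<atom = 161, count =4> <1025 1024 1393 10010 >
<atom = 520, count =1> <1024 >
<atom = 568, count =1> <2417 >
"""

# One atom header followed by its bracketed list of values.
PATTERN = re.compile(r"<atom\s*=\s*(\d+),\s*count\s*=\s*(\d+)>\s*<([\d\s]*)>")

def shared_values(listing):
    """Map each value held by two or more atoms to the atoms that hold it."""
    atoms_by_b = defaultdict(list)
    for atom, count, values in PATTERN.findall(listing):
        b_values = values.split()
        if len(b_values) != int(count):
            raise ValueError(f"count field disagrees with list for atom {atom}")
        for b in b_values:
            atoms_by_b[b].append(atom)
    return {b: atoms for b, atoms in atoms_by_b.items() if len(atoms) >= 2}

clusters = shared_values(LISTING)
```

On these five atoms, value 1024 is shared by atoms 3130, 161 and 520, and value 2417 is shared by atoms 3128 and 568, which agrees with the two sets of syntagmatic units given for category E2.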
A “show atom” command and a “show graph” command will soon allow the user to see the types of graph and graph components that are hand drawn in Figure 10.
Figure 10: The specification of both atoms and event maps